On the Connection Between Learning Two-Layers Neural Networks and Tensor Decomposition
Authors
Abstract
We establish connections between the problem of learning a two-layers neural network with good generalization error and tensor decomposition. We consider a model with input x ∈ R^d, r hidden units with weights {w_i}_{1≤i≤r} and output y ∈ R, i.e., y = ∑_{i=1}^r σ(〈x, w_i〉), where 〈·, ·〉 denotes the scalar product and σ the activation function. First, we show that, if we cannot learn the weights {w_i}_{1≤i≤r} accurately, then the neural network does not generalize well. More specifically, the generalization error is close to that of a trivial predictor with access only to the norm of the input. This result holds for any activation function, and it requires that the weights are roughly isotropic and the input distribution is Gaussian, which is a typical assumption in the theoretical literature. Then, we show that the problem of learning the weights {w_i}_{1≤i≤r} is at least as hard as the problem of tensor decomposition. This result holds for any input distribution and assumes that the activation function is a polynomial whose degree is related to the order of the tensor to be decomposed. By putting everything together, we prove that learning a two-layers neural network that generalizes well is at least as hard as tensor decomposition. It has been observed that neural network models with more parameters than training samples often generalize well, even if the problem is highly underdetermined. This means that the learning algorithm does not estimate the weights accurately and yet is able to yield a good generalization error. This paper shows that such a phenomenon cannot occur when the input distribution is Gaussian and the weights are roughly isotropic. We also provide numerical evidence supporting our theoretical findings.
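For concreteness, the following is a minimal numerical sketch of the setup described in the abstract: Gaussian inputs, roughly isotropic weights, the norm-only baseline predictor, and an empirical input-output moment tensor of the kind the reduction to tensor decomposition involves. The dimensions, the ReLU choice of σ, and all function names are illustrative assumptions, not the authors' experimental setup.

```python
import numpy as np

# Minimal numerical sketch of the model in the abstract:
#   y = sum_{i=1}^r sigma(<x, w_i>),  x ~ N(0, I_d),  weights roughly isotropic.
# All dimensions, the ReLU choice of sigma, and the names below are illustrative.

rng = np.random.default_rng(0)
d, r, n = 20, 5, 10_000                        # input dim, hidden units, samples

W = rng.standard_normal((r, d)) / np.sqrt(d)   # rows w_i, roughly isotropic
X = rng.standard_normal((n, d))                # Gaussian inputs

def sigma(z):
    return np.maximum(z, 0.0)                  # activation (ReLU as an example)

y = sigma(X @ W.T).sum(axis=1)                 # y = sum_i sigma(<x, w_i>)

# The "trivial predictor with access only to the norm of the input": estimate
# E[y | ||x||] by binning the norm and averaging y within each bin.
norms = np.linalg.norm(X, axis=1)
edges = np.quantile(norms, np.linspace(0, 1, 21)[1:-1])
bins = np.digitize(norms, edges)
norm_pred = np.array([y[bins == b].mean() for b in range(bins.max() + 1)])[bins]
print("variance explained by the norm-only predictor:",
      1.0 - np.mean((y - norm_pred) ** 2) / np.var(y))

# Connection to tensor decomposition: for a polynomial activation, low-order
# input-output moments encode a rank-r structure built from the weights w_i.
# Here is an empirical order-3 moment as a simple illustration (not the exact
# construction used in the paper).
T3 = np.einsum('n,ni,nj,nk->ijk', y, X, X, X) / n
print("order-3 moment tensor shape:", T3.shape)
```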
Similar Papers
On the convergence speed of artificial neural networks in the solving of linear systems
Artificial neural networks have advantages such as learning, adaptation, fault-tolerance, parallelism, and generalization. This paper examines how different learning methods affect the speed of convergence of neural networks. To this end, we first introduce a perceptron method based on artificial neural networks which has been applied to solving a non-singula...
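The snippet above is cut off, but the general idea of solving a non-singular linear system with a gradient-trained, perceptron-like linear unit can be sketched as follows; the matrix, step size, and stopping rule are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

# Hypothetical sketch: treat solving a non-singular linear system A x = b as
# training a single linear unit by gradient descent on the squared residual,
# and observe the convergence speed.

rng = np.random.default_rng(1)
n = 10
A = rng.standard_normal((n, n)) + n * np.eye(n)   # well-conditioned, non-singular
b = rng.standard_normal(n)

x = np.zeros(n)
lr = 1.0 / np.linalg.norm(A, 2) ** 2              # step size from the spectral norm
for step in range(10_000):
    residual = A @ x - b
    x -= lr * (A.T @ residual)                    # gradient of 0.5 * ||A x - b||^2
    if np.linalg.norm(residual) < 1e-10:
        break

print(f"converged in {step} steps, "
      f"error {np.linalg.norm(x - np.linalg.solve(A, b)):.2e}")
```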
Effect of sound classification by neural networks in the recognition of human hearing
In this paper, we focus on two basic issues: (a) the classification of sound by neural networks based on frequency and sound-intensity parameters, and (b) evaluating the health of different human ears compared to those of a healthy person. Sound classification by a specific feed-forward neural network with two inputs (frequency and sound intensity) and two hidden layers is proposed. This process...
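A minimal sketch of the architecture described in the snippet, assuming small tanh hidden layers and a sigmoid output unit for binary classification; the layer widths, initialization, and example inputs are illustrative and not taken from the paper.

```python
import numpy as np

# Feed-forward network with two inputs (frequency, sound intensity) and two
# hidden layers, forward pass only. Widths and weights are illustrative.

rng = np.random.default_rng(2)

def init(widths):
    params = []
    for fan_in, fan_out in zip(widths[:-1], widths[1:]):
        params += [rng.standard_normal((fan_in, fan_out)) * 0.5, np.zeros(fan_out)]
    return params

def forward(x, params):
    W1, b1, W2, b2, W3, b3 = params
    h1 = np.tanh(x @ W1 + b1)                       # first hidden layer
    h2 = np.tanh(h1 @ W2 + b2)                      # second hidden layer
    return 1.0 / (1.0 + np.exp(-(h2 @ W3 + b3)))    # sigmoid class probability

params = init([2, 8, 8, 1])                         # inputs: (frequency, intensity)
x = np.array([[440.0 / 1000, 0.6],                  # crudely scaled example inputs
              [8000.0 / 1000, 0.2]])
print(forward(x, params))                           # predicted class probabilities
```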
Handwritten Character Recognition using Modified Gradient Descent Technique of Neural Networks and Representation of Conjugate Descent for Training Patterns
The purpose of this study is to analyze the performance of the backpropagation algorithm with changing training patterns and a second momentum term in feed-forward neural networks. This analysis is conducted on 250 different words of three small letters from the English alphabet. These words are presented to two vertical segmentation programs, which are designed in MATLAB and based on portions (1...
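A sketch of gradient descent with a momentum (second) term of the kind the study above varies. The quadratic objective stands in for a network's training loss; the learning rate, momentum coefficient, and objective are illustrative assumptions, not the study's configuration.

```python
import numpy as np

def gd_with_momentum(grad, x0, lr=0.1, momentum=0.9, steps=200):
    # Classical momentum: the velocity accumulates past gradients,
    # damping oscillations along poorly conditioned directions.
    x = np.asarray(x0, dtype=float).copy()
    velocity = np.zeros_like(x)
    for _ in range(steps):
        velocity = momentum * velocity - lr * grad(x)
        x = x + velocity
    return x

# Example: a poorly conditioned quadratic, where the momentum term helps speed.
H = np.diag([1.0, 25.0])
grad = lambda x: H @ x                     # gradient of 0.5 * x^T H x
print(gd_with_momentum(grad, [5.0, 5.0]))  # approaches the minimizer at 0
```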
An Analysis of the Connections Between Layers of Deep Neural Networks
We present an analysis of different techniques for selecting the connections between layers of deep neural networks. Traditional deep neural networks use random connection tables between layers to keep the number of connections small and to tune to different image features. This kind of connection performs adequately in supervised deep networks because the connection values are refined during training. ...
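As a rough illustration of the connection tables mentioned above, the following sketch wires each output map to a small random subset of input maps, which keeps the number of connections low; the sizes and the dense-matrix masking are assumptions for illustration only.

```python
import numpy as np

# Random connection table between two layers: each output map is connected to
# `fan_in` randomly chosen input maps; all other connections are zeroed out.

rng = np.random.default_rng(3)
in_maps, out_maps, fan_in = 16, 32, 4

# connection table: for every output map, pick `fan_in` input maps at random
table = np.stack([rng.choice(in_maps, size=fan_in, replace=False)
                  for _ in range(out_maps)])

# equivalent binary mask applied to a dense weight matrix
mask = np.zeros((out_maps, in_maps))
for j, rows in enumerate(table):
    mask[j, rows] = 1.0

W = rng.standard_normal((out_maps, in_maps)) * mask   # only tabled connections active
x = rng.standard_normal(in_maps)
print((W @ x).shape, "active connections:", int(mask.sum()))
```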
Growing Artificial Neural Networks Based on
With this paper we propose a learning architecture for growing complex artificial neural networks. The complexity of the growing network is adapted automatically according to the complexity of the task. The algorithm generates a feed-forward network bottom-up by cyclically inserting cascaded hidden layers. Inputs of a hidden layer unit are locally restricted with respect to the input space by...
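A very loose sketch of the growth loop described in the snippet, assuming random tanh layers are cascaded onto the existing features and only the output weights are refit at each step; the paper's locally restricted units and exact stopping criterion are not reproduced here.

```python
import numpy as np

# Grow a feed-forward network bottom-up by cyclically inserting hidden layers
# until the task error stops improving. Layers and task are illustrative.

rng = np.random.default_rng(4)
X = rng.standard_normal((500, 2))
y = np.sign(X[:, 0] * X[:, 1])                 # a simple nonlinear task

def add_layer(features, width=16):
    W = rng.standard_normal((features.shape[1], width))
    # cascade: new hidden units see all existing features, old features are kept
    return np.concatenate([features, np.tanh(features @ W)], axis=1)

features, best_err = X, np.inf
while True:
    features = add_layer(features)
    # refit only the output weights on the current features (least squares)
    w, *_ = np.linalg.lstsq(features, y, rcond=None)
    err = np.mean(np.sign(features @ w) != y)
    print(f"feature dim {features.shape[1]}, error {err:.3f}")
    if err >= best_err or err == 0.0:
        break                                   # stop when growing no longer helps
    best_err = err
```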
Journal: CoRR
Volume: abs/1802.07301
Pages: -
Publication date: 2018